THE BERNSTEIN-VON MISES THEOREM FOR THE
PROPORTIONAL HAZARD MODEL

By Yongdai Kim 

Seoul National University 

We study large sample properties of Bayesian analysis of the pro- 
portional hazard model with neutral to the right process priors on 
the baseline hazard function. We show that the posterior distribution 
of the baseline cumulative hazard function and regression coefficients 
centered at the maximum likelihood estimator is jointly asymptoti- 
cally equivalent to the sampling distribution of the maximum likeli- 
hood estimator. 

1. Introduction. Since Cox [3] proposed the proportional hazard model 
for survival time data in the presence of covariates, the proportional hazard 
model has enjoyed a wide variety of applications in biomedical data analysis 
and reliability. Although it does not require any parametric assumption on 
the baseline cumulative hazard function (c.h.f.), its computation is almost 
parametric. By casting the theoretical framework as a counting process prob- 
lem, the study of its asymptotic properties becomes a historical success story 
in theoretical statistics. These are some of many reasons for its popularity 
in applications as well as the theory of statistics. 

The Bayesian analysis of the proportional hazard model has also been 
studied by many authors. Kalbfleisch [11] studied its Bayesian analysis with 
gamma process priors on the baseline c.h.f. For the Bayesian analysis of 
the proportional hazard model with beta process priors, a Markov chain 
Monte Carlo computation is proposed by Laud, Damien and Smith [17] and 
Lee and Kim [18], and the marginal posterior distribution of the regression 
coefficients is obtained by Hjort [10]. Kim and Lee [14] obtained the posterior
distribution for the proportional hazard model with neutral to the right
process priors [5] when the survival times are under left truncation and right 
censoring. Most research effort from the Bayesian side has been devoted to 
identifying the posterior distribution and its computation, but asymptotic 
properties of the proportional hazard model have not been studied. 

The asymptotic properties of the posterior are, however, an important 
theoretical issue in nonparametric Bayesian models, for there are many un- 
expected phenomena reported in the literature. For example, Diaconis and 
Freedman [4] showed that nonparametric posteriors could have inconsistency 
even with reasonable priors. They argued further that the inconsistency of 
the posterior in nonparametric problems is a rule, not an exception. The 
related work on this issue includes Ghosal, Ghosh and Ramamoorthi [8] and 
Barron, Schervish and Wasserman [1]. For right-censored data, Kim and Lee 
[13] showed that not all neutral to the right prior processes have consistent 
posteriors and gave sufficient conditions for consistency. 

This unfortunate phenomenon continues to occur in the posterior conver- 
gence rate. See [2, 9, 23, 25]. These examples cast doubt on the Bernstein- 
von Mises theorem in nonparametric models, which states that the posterior 
distribution centered at the maximum likelihood estimator is asymptotically 
equivalent to the sampling distribution of the maximum likelihood estima- 
tor. See also [7]. In contrast, however, Shen [22] proved that even in semi- 
parametric/nonparametric models, if the parameter of interest is of finite 
dimension, one does not need to worry because the Bernstein-von Mises 
theorem holds for finite-dimensional parameters. 

If the Bernstein-von Mises theorem does not hold, it often implies the 
Bayesian credible set has zero efficiency relative to the frequentist confidence 
interval. The validity of the Bernstein-von Mises theorem also has an im- 
portant implication in practice, because the Bernstein-von Mises theorem 
warrants use of Bayesian credible sets as frequentist confidence intervals 
asymptotically. Kim and Lee [16] studied the Bernstein-von Mises theorem 
for right-censored survival data without covariates. They found that for any 
$0 < \alpha < 1/2$ there is a consistent prior process neutral to the right whose
posterior convergence rate is exactly $n^{-\alpha}$ and also showed that for popular
prior processes such as beta, gamma and Dirichlet processes, indeed the
Bernstein-von Mises theorem does hold. 

In this paper we prove the Bernstein-von Mises theorem for Bayesian 
analysis of the proportional hazard model. The proof consists of the two 
Bernstein-von Mises theorems: one for the marginal posterior distribution 
of the regression coefficients and the other for the conditional posterior dis- 
tribution of the baseline cumulative hazard functions given the regression 
coefficients. These two Bernstein-von Mises theorems together yield the 
Bernstein-von Mises theorem of the joint posterior distribution of the regression
coefficients and the baseline cumulative hazard function. The main idea
of the proof of the Bernstein-von Mises theorem of the marginal posterior
distribution of the regression coefficients is to show that (i) on $1/\sqrt{n}$
neighborhoods of the true value, the posterior density converges to the targeted
normal density with respect to the $L_1$ norm and (ii) outside $1/\sqrt{n}$
neighborhoods of the true value, the posterior mass vanishes eventually. For (i),
we approximate the posterior distribution with the product of the partial
likelihood and prior, and show that the product of the partial likelihood
and prior converges to the target normal distribution with respect to the
$L_1$ norm. The proof of (ii) is the harder part since the posterior
distribution is not log-concave. For (ii), we use a sequence of log-concave functions
which dominate the posterior distribution and whose total masses vanish
eventually outside neighborhoods of the true value. The proof of the
Bernstein-von Mises theorem for the baseline cumulative hazard function
given the regression coefficients exploits the functional central limit
theorem for independent increment (II) processes (Theorem 19 of Section V.4
in [20]), for the conditional posterior distribution of the baseline cumulative
hazard function given the regression coefficients is an II process.

The paper is organized as follows. In Section 2 prior processes neutral to 
the right are reviewed briefly and the posterior distribution of the regression 
coefficients and the baseline hazard function is given. In Section 3 the main 
results are stated and examples are given. Section 4 proves the main results 
with key lemmas, whose proofs are presented in the Appendix. 

2. Neutral to the right processes as priors. The postulation of the
proportional hazard model is as follows. Let $X_1, \ldots, X_n$ be survival times with
covariates $Z_1, \ldots, Z_n$, where $Z_i \in \mathbb{R}^p$, $i = 1, \ldots, n$. Suppose the distribution
$F_i$ of $X_i$ with covariate $Z_i$ is given by

$$1 - F_i(t) = (1 - F(t))^{\exp(\beta^T Z_i)}$$

for an unknown regression parameter $\beta \in \mathbb{R}^p$ and where $F$ is an unknown
distribution of a survival time with covariate being 0. In most applications, the
survival times are subject to right censoring, that is, $(T_1, \delta_1, Z_1), \ldots, (T_n, \delta_n, Z_n)$
are observed, where $T_i = \min(C_i, X_i)$, $\delta_i = I(X_i \le C_i)$ and $C_1, \ldots, C_n$ are
independent random variables with the common distribution function $G$.
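To make the observation scheme concrete, the following sketch simulates right-censored triples $(T_i, \delta_i, Z_i)$ from the model above. The exponential baseline, the uniform scalar covariate, the exponential censoring truncated at `tau`, and every function and parameter name are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def simulate_ph_data(n, beta, baseline_rate=1.0, censor_rate=0.5, tau=3.0, seed=0):
    """Simulate (T_i, delta_i, Z_i) from a proportional hazard model.

    Illustrative assumptions (not from the paper): exponential baseline with
    rate `baseline_rate`, scalar covariate Z ~ Uniform(-1, 1), and exponential
    censoring truncated at `tau`.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.uniform(-1.0, 1.0)
        # 1 - F_i(t) = (1 - F(t))^{exp(beta * z)} with 1 - F(t) = exp(-rate * t),
        # so X_i is exponential with rate baseline_rate * exp(beta * z).
        x = rng.expovariate(baseline_rate * math.exp(beta * z))
        c = min(rng.expovariate(censor_rate), tau)  # censoring time C_i
        t = min(x, c)                               # observed time T_i
        delta = 1 if x <= c else 0                  # 1 = uncensored, 0 = censored
        data.append((t, delta, z))
    return data

sample = simulate_ph_data(n=5, beta=0.7)
```

Truncating the censoring times at `tau` is a design choice made only so that every observed time lies in a bounded window.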

In the proportional hazard model, there are two parameters: the regression
coefficients $\beta$ and the baseline distribution function $F$. For prior
distributions, we take a process neutral to the right [5] for $F$ and a usual parametric
prior distribution for $\beta$. Processes neutral to the right include many
popular prior processes such as Dirichlet processes, gamma processes and beta
processes.

We say that a prior process on the c.d.f. $F$ is a process neutral to the
right if the corresponding c.h.f. $A$ is a nondecreasing independent increment
(NII) process such that $A(0) = 0$, $0 \le \Delta A(t) \le 1$ for all $t$ with probability 1
and either $\Delta A(t) = 1$ for some $t > 0$ or $\lim_{t\to\infty} A(t) = \infty$ with probability 1.
See [5] for the original definition of processes neutral to the right and see [10,
12, 13] for the connection between the definition given here and Doksum's
definition. From now on, the term NII process is used for a prior process
of the c.h.f. $A$ which induces a process neutral to the right on $F$.
The Lévy measure $\nu$ of an NII process $A$ is defined by

$$\nu([0,t] \times B) = E\Bigl(\sum_{s \in [0,t]} I(\Delta A(s) \in B \setminus \{0\})\Bigr),$$

where $t > 0$ and $B$ is a Borel subset of $[0, 1]$. Conversely, for any $\sigma$-finite
measure $\nu$ defined on $[0, \infty) \times [0, 1]$ which satisfies, for all $t > 0$,
$\int_0^t \int_0^1 x\,\nu(ds, dx) < \infty$, there exists a unique NII process whose Lévy measure is $\nu$. Hence, any
NII process can be characterized by its Lévy measure.

The mean and variance of an NII process $A(t)$ with Lévy measure $\nu$ can
be conveniently calculated by the formulas

$$E(A(t)) = \int_0^t \int_0^1 x\,\nu(ds, dx) \tag{1}$$

and

$$\operatorname{Var}(A(t)) = \int_0^t \int_0^1 x^2\,\nu(ds, dx) - \sum_{s \le t} \Bigl(\int_0^1 x\,\nu(\{s\}, dx)\Bigr)^2. \tag{2}$$

These formulas constitute basic facts for the asymptotic theory of the pos- 
terior and will be used subsequently in this paper. 
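As a numerical illustration of formulas (1) and (2), the sketch below evaluates them by a midpoint rule for a homogeneous Lévy measure of beta-process type, $\nu(ds, dx) = c\,x^{-1}(1-x)^{c-1}\,dx\,\lambda\,ds$ with constants $c$ and $\lambda$. The constant parameters, the restriction $c \ge 1$ (which keeps the integrand bounded near $x = 1$) and all function names are assumptions made for this example only.

```python
def midpoint_integral(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def nii_mean_var(t, c, lam):
    """Mean and variance of A(t) from formulas (1) and (2).

    Assumes (for illustration) nu(ds, dx) = c x^{-1} (1-x)^{c-1} dx * lam ds
    with constants c >= 1 and lam > 0. This measure has no fixed jump times,
    so the sum in (2) vanishes.
    """
    density = lambda x: c * (1.0 - x) ** (c - 1.0) / x  # x-part of nu
    mean = lam * t * midpoint_integral(lambda x: x * density(x), 0.0, 1.0)
    var = lam * t * midpoint_integral(lambda x: x * x * density(x), 0.0, 1.0)
    return mean, var

mean, var = nii_mean_var(t=2.0, c=2.0, lam=0.5)
# Closed forms for comparison: mean = lam * t and var = lam * t / (c + 1).
```

For this choice of $\nu$ the inner integrals are Beta integrals, which is why the closed forms in the final comment are available for comparison.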

Let $q_n$ be the number of distinct uncensored observations and let
$t_1 < t_2 < \cdots < t_{q_n}$ be the ordered distinct uncensored observations. Define two
sets $D_n(t)$ and $R_n(t)$ by

$$D_n(t) = \{i : T_i = t, \delta_i = 1, i = 1, \ldots, n\}$$

and

$$R_n(t) = \{i : T_i \ge t, i = 1, \ldots, n\}.$$

Let $R_n^+(t) = R_n(t) - D_n(t)$.
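The sets just defined can be sketched in a few lines. The toy data and the function names below are hypothetical, and indices start at 0 in the code rather than 1 as in the text.

```python
def distinct_uncensored_times(data):
    """Ordered distinct uncensored times t_1 < ... < t_{q_n}."""
    return sorted({t for t, delta, _ in data if delta == 1})

def death_set(data, t):
    """D_n(t): indices i with T_i = t and delta_i = 1."""
    return {i for i, (ti, delta, _) in enumerate(data) if ti == t and delta == 1}

def risk_set(data, t):
    """R_n(t): indices i with T_i >= t."""
    return {i for i, (ti, _, _) in enumerate(data) if ti >= t}

def risk_set_plus(data, t):
    """R_n^+(t) = R_n(t) - D_n(t)."""
    return risk_set(data, t) - death_set(data, t)

# Toy data (hypothetical): tuples (T_i, delta_i, Z_i), indexed from 0.
toy = [(1.0, 1, 0.5), (2.0, 0, -0.3), (2.0, 1, 0.8), (3.5, 1, 0.1), (4.0, 0, -0.6)]
times = distinct_uncensored_times(toy)
```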

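A priori, let the baseline c.d.f. $F$ be a process neutral to the right such
that the corresponding c.h.f. $A$ is an NII process with a Lévy measure $\nu$ of
the form

$$\nu(dt, dx) = f_t(x)\,dx\,dt \tag{3}$$

for $x \in [0, 1]$, and let $\pi(\beta)$ be the prior density function for $\beta$. Without loss
of generality, we assume that $T_1 \le T_2 \le \cdots \le T_n$. The next theorem provides
the posterior distribution of $\beta$ as well as $A$. The proof is in [14].

Theorem 2.1. Let $D_n = ((T_1, \delta_1, Z_1), \ldots, (T_n, \delta_n, Z_n))$.

(i) Conditional on $\beta$ and $D_n$, the posterior distribution of $F$ is a process
neutral to the right with Lévy measure

$$\nu(dt, dx \mid \beta, D_n) = (1 - x)^{\sum_{j \in R_n(t)} \exp(\beta^T Z_j)} f_t(x)\,dx\,dt + \sum_{i=1}^{q_n} dH_{ni}(x \mid \beta)\,\delta_{t_i}(dt), \tag{4}$$

where $\delta_a$ is the point measure at $a$ and $H_{ni}(\cdot \mid \beta)$ is the probability measure
defined on $[0, 1]$ with density

$$h_{ni}(x \mid \beta) \propto \prod_{j \in D_n(t_i)} \bigl(1 - (1 - x)^{\exp(\beta^T Z_j)}\bigr)\,(1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)} f_{t_i}(x). \tag{5}$$

(ii) The marginal posterior distribution of $\beta$ is

$$\pi(\beta \mid D_n) \propto e^{-\rho_n(\beta)} \prod_{i=1}^{q_n} \int_0^1 h_{ni}(x \mid \beta)\,dx\;\pi(\beta), \tag{6}$$

where $h_{ni}$ in (6) is read as the unnormalized right-hand side of (5) and

$$\rho_n(\beta) = \sum_{i=1}^{n} \int_0^{T_i} \int_0^1 \bigl(1 - (1 - x)^{\exp(\beta^T Z_i)}\bigr)\,(1 - x)^{\sum_{j=i+1}^{n} \exp(\beta^T Z_j)} f_t(x)\,dx\,dt,$$

with the convention $\sum_{j=i+1}^{n} \exp(\beta^T Z_j) = 0$ when $i = n$.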

3. Main result. Let $\beta_0$ and $F_0$ be the true values of the parameters from which
$X_1, \ldots, X_n$ are generated and let $A_0$ be the cumulative hazard function of $F_0$.
In this section we present the Bernstein-von Mises theorem of the posterior
distribution of $(\beta, A)$. That is, we show that asymptotically the posterior
distribution of $(\beta, A)$ centered at the maximum partial likelihood estimator
(MLE) $(\hat\beta, \hat A)$ is the same as the asymptotic distribution of the MLE itself.

The following conditions are assumed to hold in the remainder of this 
section: 

(A1) $A_0$ is absolutely continuous.

(A2) For a positive constant $\tau$, $F_0(\tau) < 1$, $G(\tau-) < 1$ and $G(\tau) = 1$.

(A3) $Z_1, \ldots, Z_n$ are i.i.d. $p$-dimensional random vectors such that $\|Z_1\| \le M_Z < \infty$
with probability 1 for some constant $M_Z$, where $\|Z_1\| = \sum_{i=1}^{p} |Z_{1i}|$.

(A4) If $\Pr(c^T Z_1 = 0) = 1$, then $c = 0$.

(A5) $\pi(\beta)$ is continuous at $\beta_0$ with $\pi(\beta_0) > 0$.

Condition (A1) prevents ties. If $A_0$ has a finite number of discontinuity
points, then the proof can be done separately on the continuous part and
the discrete part. Condition (A2) assumes that some patients remain in the
study until time $\tau$, which is necessary to recover the information of $A(t)$
on $[0, \tau]$. If condition (A2) holds for all $\tau$, the Bernstein-von Mises
theorem for $A$ holds on $[0, \infty)$. But note that even if $\tau < \infty$, the Bernstein-von
Mises theorem for $\beta$ holds as long as $\tau > 0$. Condition (A3) is for technical
purposes, and condition (A4) is to avoid collinearity among the covariates.
Condition (A5) is a standard assumption for Bernstein-von Mises type
results.

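Let $\hat\beta$ be the maximum (partial) likelihood estimator which maximizes
the partial likelihood

$$L_n(\beta) = \prod_{i=1}^{n} \Bigl(\frac{\exp(\beta^T Z_i)}{\sum_{j \in R_n(T_i)} \exp(\beta^T Z_j)}\Bigr)^{\delta_i}.$$

Let

$$\hat A(t) = \int_0^t \frac{dN_\cdot(s)}{\sum_{i \in R_n(s)} \exp(\hat\beta^T Z_i)},$$

where $N_\cdot(t) = \sum_{i=1}^{n} N_i(t)$ and $N_i(t) = I(T_i \le t, \delta_i = 1)$. In fact, $\hat A$ is Breslow's
estimator of the baseline cumulative hazard function [3]. We introduce the notation

$$e_0(t) = \int_0^t \frac{S^1(s : \beta_0)}{S^0(s : \beta_0)}\,dA_0(s), \qquad U_0(t) = \int_0^t \frac{dA_0(s)}{S^0(s : \beta_0)},$$

$$S^0(t : \beta) = E(\exp(\beta^T Z_1) I(T_1 \ge t)),$$

$$S^1(t : \beta) = E(Z_1 \exp(\beta^T Z_1) I(T_1 \ge t)),$$

$$S^2(t : \beta) = E(Z_1 Z_1^T \exp(\beta^T Z_1) I(T_1 \ge t)),$$

$$I(\beta) = \int_0^\tau V(t : \beta) S^0(t : \beta)\,dA_0(t),$$

$$V(t : \beta) = \frac{S^2(t : \beta)}{S^0(t : \beta)} - \frac{S^1(t : \beta)}{S^0(t : \beta)} \Bigl(\frac{S^1(t : \beta)}{S^0(t : \beta)}\Bigr)^T.$$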
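The partial likelihood and the Breslow-type estimator defined above can be sketched as follows for a scalar covariate. The bisection search on the score, the bracketing interval $[-20, 20]$, the toy data and all function names are assumptions made for illustration; the paper does not prescribe a numerical method.

```python
import math

def log_partial_likelihood(beta, data):
    """log L_n(beta) for scalar covariates; data holds (T_i, delta_i, Z_i)."""
    total = 0.0
    for t_i, delta_i, z_i in data:
        if delta_i == 1:
            risk = sum(math.exp(beta * z) for t, _, z in data if t >= t_i)
            total += beta * z_i - math.log(risk)
    return total

def score(beta, data):
    """Derivative of log L_n in beta; nonincreasing because log L_n is concave."""
    total = 0.0
    for t_i, delta_i, z_i in data:
        if delta_i == 1:
            w = [(math.exp(beta * z), z) for t, _, z in data if t >= t_i]
            total += z_i - sum(e * z for e, z in w) / sum(e for e, _ in w)
    return total

def maximize_partial_likelihood(data, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection on the score; assumes the maximizer lies in [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, data) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def breslow(t, beta, data):
    """Sum over uncensored T_i <= t of 1 / sum_{j in R_n(T_i)} exp(beta * Z_j)."""
    total = 0.0
    for t_i, delta_i, _ in data:
        if delta_i == 1 and t_i <= t:
            total += 1.0 / sum(math.exp(beta * z) for s, _, z in data if s >= t_i)
    return total

# Toy data (hypothetical): tuples (T_i, delta_i, Z_i).
toy = [(1.0, 1, 0.5), (2.0, 1, -0.3), (3.0, 0, 0.8), (4.0, 1, 0.1), (5.0, 1, -0.6)]
beta_hat = maximize_partial_likelihood(toy)
cum_hazard_at_4 = breslow(4.0, beta_hat, toy)
```

With `beta = 0` the function `breslow` reduces to the Nelson-Aalen sum of reciprocal risk-set sizes, which gives a quick sanity check on any implementation.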
Assume that a priori $A$ is an NII process with Lévy measure given by

$$\nu(ds, dx) = \frac{g_s(x)}{x}\,dx\,\lambda(s)\,ds, \qquad s \ge 0,\ 0 \le x \le 1, \tag{7}$$

where $\int_0^1 g_t(x)\,dx = 1$ for all $t \in [0, \tau]$ and $\lambda(t)$ is bounded and positive
on $(0, \tau)$.

Remark. Comparing (3) and (7), we can see that

$$\int_0^t \lambda(s)\,ds = \int_0^t \int_0^1 x f_s(x)\,dx\,ds = E(A(t))$$

and $g_t(x) = x f_t(x) / \lambda(t)$ provided $\lambda(t) > 0$.

Remark. Positiveness of $\lambda(t)$ on $t \in (0, \tau)$ is necessary for the Bernstein-von
Mises theorem. Suppose $\lambda(t) = 0$ for $t \in [c, d]$ where $0 < c < d < \tau$.
Then both the prior and posterior put mass 1 on the set of c.h.f.'s $A$ with
$A(d) = A(c)$.

For the Bernstein-von Mises theorem, we need the following two
conditions:

(C1) There exists a positive number $\varsigma$ such that

$$\sup_{t \in [0, \tau],\, x \in [0, 1]} (1 - x)^{1 - \varsigma} g_t(x) < \infty.$$

(C2) There exists a function $k(t)$ defined on $[0, \tau]$ such that for some $\alpha > 1/2$
and $\varepsilon > 0$

$$\sup_{t \in [0, \tau],\, h \in [0, \varepsilon]} \Bigl|\frac{g_t(h) - k(t)}{h^{\alpha}}\Bigr| < \infty$$

and

$$0 < \inf_{t \in [0, \tau]} k(t) \le \sup_{t \in [0, \tau]} k(t) < \infty.$$

Throughout this paper, we let

$$g^* = \sup_{t \in [0, \tau],\, x \in [0, 1]} (1 - x)^{1 - \varsigma} g_t(x),$$

$k_* = \inf_{t \in [0, \tau]} k(t)$ and $k^* = \sup_{t \in [0, \tau]} k(t)$.

Conditions (C1) with $\varsigma = 0$ and (C2) are used for the Bernstein-von Mises
theorem of the survival function without covariates by Kim and Lee [16].
The most delicate part of the proof in this paper is to show that the tail
probability of the posterior distribution of $\beta$ converges to 0 sufficiently fast.
The positiveness of $\varsigma$ plays an important role for this.

The following theorems are the main results of this paper. We first state
the result, interesting in its own right, that the marginal posterior
density of $\beta$ converges to a normal density in the $L_1$ norm. This is stronger
than the usual Bernstein-von Mises theorem, which states that the posterior
converges weakly to a normal distribution in probability, because our result
states that the posterior density converges to a normal density in the $L_1$
norm with probability 1.

Theorem 3.1. Under conditions (C1) and (C2),

$$\lim_{n \to \infty} \|f_n - \phi\| = 0 \tag{8}$$

with probability 1, where $f_n$ is the posterior density of $\sqrt{n}(\beta - \hat\beta)$, $\phi$ is the
normal density with mean 0 and variance $I(\beta_0)^{-1}$, and $\|\cdot\|$ is the $L_1$ norm.

The next theorem states that the conditional distribution of $\sqrt{n}(A - \hat A)$
given $\beta$ and data converges to a Gaussian process.

Theorem 3.2. Under conditions (C1) and (C2),

$$\mathcal{L}\bigl(\sqrt{n}(A(\cdot) - \hat A(\cdot)) \mid \sqrt{n}(\beta - \hat\beta) = x, D_n\bigr) \xrightarrow{d} W(U_0(\cdot)) - x^T e_0(\cdot)$$

on $D[0, \tau]$ with probability 1, where $W$ is standard Brownian motion. Here,
$D[0, \tau]$ is the space of right-continuous functions on $[0, \tau]$ with left limits
existing on $[0, \tau]$, equipped with the uniform topology.

The proofs of Theorems 3.1 and 3.2 are presented in Section 4. Combining 
Theorems 3.1 and 3.2, we can prove the main theorem stated below. 

Theorem 3.3. Under conditions (C1) and (C2),

$$\mathcal{L}\bigl(\sqrt{n}(A(\cdot) - \hat A(\cdot)),\ \sqrt{n}(\beta - \hat\beta) \mid D_n\bigr) \xrightarrow{d} \bigl(W(U_0(\cdot)) - X^T e_0(\cdot),\ X\bigr) \tag{9}$$

as $n \to \infty$ on $D[0, \tau] \times \mathbb{R}^p$, where $X$ is a multivariate normal random vector
with mean 0 and variance $I^{-1}(\beta_0)$ and $W$ is standard Brownian motion
independent of $X$.

Proof. Theorems 3.1 and 3.2 prove the convergence of the marginal
posterior distribution of $\beta$ and the conditional posterior distribution of $A$
given $\beta$. To prove the convergence of the joint posterior distribution of $\beta$ and
$A$, note that Theorem 3.1 implies the strong convergence of the marginal
posterior distribution of $\sqrt{n}(\beta - \hat\beta)$ to the distribution of $X$. Applying
Theorem 2 of [21], we complete the proof. □

Remark. It should be noted that the limiting distribution (9) is the
same as that of the maximum likelihood estimators centered at the true
values.

Remark. From (9), we can see that marginally the posterior
distributions of $\sqrt{n}(\beta - \hat\beta)$ and $\sqrt{n}(A - \hat A)$ converge weakly to a normal distribution
and a Gaussian process, respectively.

In the following examples, we show that most popular prior processes
such as beta processes and gamma processes satisfy condition (C1). For
condition (C2), see [16].

Example 1 (Beta process). The beta process with mean $\Lambda$ and scale
parameter $c$ is an NII process with Lévy measure $\nu$ given by

$$\nu(dt, dx) = \frac{c(t)}{x} (1 - x)^{c(t) - 1}\,dx\,d\Lambda(t).$$

Suppose that $\Lambda(t)$ is absolutely continuous with $\lambda(t) = d\Lambda(t)/dt$. Then
$g_t(x) = c(t)(1 - x)^{c(t) - 1}$. If $\inf_{t \in [0, \tau]} c(t) > 0$ and $\sup_{t \in [0, \tau]} c(t) < \infty$, condition (C1)
holds with $\varsigma = \inf_{t \in [0, \tau]} c(t)$.
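The check of condition (C1) for the beta process is a one-line computation, spelled out here for convenience. It uses only the form of $g_t$ given in Example 1 and writes $\varsigma$ for the constant appearing in (C1).

```latex
% With \varsigma = \inf_{t \in [0,\tau]} c(t) > 0 we have c(t) - \varsigma \ge 0, so for x \in [0,1]
\[
  (1 - x)^{1 - \varsigma}\, g_t(x)
  = c(t)\,(1 - x)^{c(t) - \varsigma}
  \le \sup_{t \in [0,\tau]} c(t) < \infty .
\]
```

The same bound holds for any smaller positive constant in place of the infimum, since the exponent $c(t) - \varsigma$ then stays nonnegative.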

Example 2 (Gamma process). A priori, assume that $Y(t) = -\log(1 - F(t))$
is a gamma process with parameters $(\Lambda(t), c(t))$ with $\Lambda(t) = \int_0^t \lambda(s)\,ds$,
where $\lambda(t)$ is a positive bounded function on $t \in (0, \tau)$. Furthermore,
assume that $c(t)$ is continuous around $t = 0$ and
$0 < \inf_{t \in [0, \tau]} c(t)\,(= c_*) \le \sup_{t \in [0, \tau]} c(t)\,(= c^*) < \infty$.
Here, the gamma process with parameters $(\Lambda(t), c(t))$
is defined by

$$Y(t) = \int_0^t \frac{1}{c(s)}\,dX(s),$$

where $X(t)$ is an NII process whose marginal distribution of $X(t)$ is a gamma
distribution with parameters $(\int_0^t c(s)\,d\Lambda(s), 1)$. For details of this definition,
see [19]. This prior process was used by Doksum [5], Ferguson and Phadia
[6] and Kalbfleisch [11]. Since

$$\log E(\exp(-\theta Y(t))) = \int_0^t \int_0^\infty (e^{-\theta x} - 1) \frac{c(s)}{x} \exp(-c(s) x)\,dx\,d\Lambda(s),$$

it can be shown that the c.h.f. $A$ of $F$ is an NII process with Lévy measure
given by

$$\nu(ds, dx) = \tilde c(s) \frac{1}{-\log(1 - x)} (1 - x)^{c(s) - 1}\,dx\,d\tilde\Lambda(s),$$

where

$$\tilde c(t) = \Bigl(\int_0^1 \frac{x}{-\log(1 - x)} (1 - x)^{c(t) - 1}\,dx\Bigr)^{-1}$$

and

$$\tilde\Lambda(t) = \int_0^t \frac{c(s)}{\tilde c(s)}\,d\Lambda(s).$$

Therefore, we have

$$g_t(x) = \tilde c(t) \frac{x}{-\log(1 - x)} (1 - x)^{c(t) - 1}, \qquad 0 \le x \le 1.$$

Now,

$$g_t(x) = \tilde c(t) \frac{x (1 - x)^{c(t)/2}}{-\log(1 - x)} (1 - x)^{c(t)/2 - 1} \le \Bigl(\sup_{t \in [0, \tau]} \tilde c(t)\Bigr)\, m\, (1 - x)^{c_*/2 - 1},$$

where

$$m = \sup_{x \in (0, 1)} \frac{x (1 - x)^{c_*/2}}{-\log(1 - x)}.$$

It is easy to show that $\sup_{t \in [0, \tau]} \tilde c(t) < \infty$ and so condition (C1) follows with
$\varsigma \le c_*/2$.

4. Proof of the main results. For a given sequence of random variables
$Z_n$, we write $Z_n = O(n^{\gamma})$ with probability 1 if there exists a constant $M > 0$
such that $|Z_n| / n^{\gamma} \le M$ for all but finitely many $n$ with probability 1. Also, we
write $Z_n = o(n^{\gamma})$ with probability 1 if $Z_n / n^{\gamma}$ converges to 0 with probability
1. For a given finite-dimensional array of real numbers $C$, $\|C\|$ is defined as
the sum of all the absolute values of the elements of $C$.

Let $d(i)$ be the integer such that $T_{d(i)} = t_i$ and $\delta_{d(i)} = 1$. Note that since
we assume that the true distribution is continuous there is no tie among
the uncensored observations and so $d(i)$ is well defined.

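4.1. Proof of Theorem 3.1. Let

$$h_n(\beta) = -\rho_n(\beta) + \sum_{i=1}^{q_n} \log \int_0^1 \frac{1 - (1 - x)^{\exp(\beta^T Z_{d(i)})}}{x} (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)} g_{t_i}(x)\,dx.$$

Then we have

$$\pi(\beta \mid D_n) \propto \exp(h_n(\beta))\,\pi(\beta),$$

and so the posterior density of $\sqrt{n}(\beta - \hat\beta)$ becomes $f_n(h) = g_n(h)/C_n$ where

$$g_n(h) = \exp\bigl(h_n(\hat\beta + h/\sqrt{n}) - h_n(\hat\beta)\bigr)\,\pi(\hat\beta + h/\sqrt{n})$$

and $C_n = \int_{\mathbb{R}^p} g_n(h)\,dh$. Hence, the proof of Theorem 3.1 will be completed
if we prove that

$$\int_{\mathbb{R}^p} |g_n(h) - \psi(h)\pi(\beta_0)|\,dh \to 0 \tag{10}$$

with probability 1, where

$$\psi(h) = \exp(-h^T I(\beta_0) h / 2).$$

Let $l_n(\beta) = \log L_n(\beta)$ and define

$$l(\beta) = \beta^T E(Z_1 I(\delta_1 = 1)) - \int_0^\tau \log\bigl(E(e^{\beta^T Z_1} I(T_1 \ge t))\bigr)\,E(e^{\beta_0^T Z_1} I(T_1 \ge t))\,dA_0(t).$$

It is not hard to see that $l(\beta)$ is strictly concave with attainment of its
maximum at $\beta_0$. Hence,

$$\sup_{\beta \in B} |l_n(\beta)/n - l(\beta)| \to 0 \tag{11}$$

with probability 1 for any compact subset $B$ of $\mathbb{R}^p$.

We recall the following properties of $\hat\beta$ and $l_n(\beta)$ from [24] or [15]. First,
$\hat\beta$ is consistent (i.e., $\hat\beta \to \beta_0$ with probability 1). Let $l_n^{(k)}(\beta)$ be the $k$th
derivative of $l_n(\beta)$ in $\beta$. Then $-l_n^{(2)}(\hat\beta)/n \to I(\beta_0)$ with probability 1. Also,
$\sup_{\beta \in B} \|l_n^{(3)}(\beta)\| = O(n)$ with probability 1 for any compact subset $B$ of $\mathbb{R}^p$.
We need the following two lemmas whose proofs are in the Appendix.

Lemma 1. For any compact subset $B$ of $\mathbb{R}^p$,

$$\sup_{\beta \in B} \frac{1}{n} \|h_n^{(k)}(\beta) - l_n^{(k)}(\beta)\| = o(1)$$

with probability 1 for $k = 0, 1, 2, 3$.

Lemma 2.

$$\frac{1}{\sqrt{n}} \|h_n^{(1)}(\hat\beta)\| \to 0$$

with probability 1.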

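We decompose (10) by

$$\int_{\mathbb{R}^p} |g_n(h) - \psi(h)\pi(\beta_0)|\,dh \le \int_{|h| \le K} |g_n(h) - \psi(h)\pi(\beta_0)|\,dh \tag{12}$$

$$+ \int_{|h| > K} \psi(h)\pi(\beta_0)\,dh \tag{13}$$

$$+ \int_{K < |h| < \sqrt{n}\delta} g_n(h)\,dh \tag{14}$$

$$+ \int_{|h| > \sqrt{n}\delta} g_n(h)\,dh. \tag{15}$$

We will show that for given $\varepsilon > 0$, there exist positive constants $K$ and $\delta$
such that the four terms become smaller than $\varepsilon$ for all sufficiently large $n$.

For (12), we will exploit the standard techniques used for the proof of
the Bernstein-von Mises theorem for parametric models. First, using Taylor
expansion, we write

$$\log(g_n(h)) = \frac{1}{\sqrt{n}} h^T h_n^{(1)}(\hat\beta) - \frac{1}{2} h^T \Bigl(-\frac{1}{n} h_n^{(2)}(\hat\beta)\Bigr) h + R_n(h) + \log(\pi(\hat\beta + h/\sqrt{n})), \tag{16}$$

where $h_n^{(k)}$ is the $k$th derivative of $h_n$ in $\beta$. Lemma 2 implies, for all $h$,

$$\frac{1}{\sqrt{n}} h^T h_n^{(1)}(\hat\beta) \to 0. \tag{17}$$

Lemma 1 with the properties of $l_n(\beta)$ yields

$$-\frac{1}{n} h_n^{(2)}(\hat\beta) \to I(\beta_0) \tag{18}$$

and, for all $h$,

$$|R_n(h)| \to 0. \tag{19}$$

Also, we have

$$\log(\pi(\hat\beta + h/\sqrt{n})) \to \log \pi(\beta_0) \tag{20}$$

uniformly on $\{|h| \le K\}$ with probability 1 for any $K > 0$. Now,

$$|g_n(h) - \psi(h)\pi(\beta_0)| \le |g_n(h) - \psi(h)\pi(\hat\beta + h/\sqrt{n})| + |\psi(h)\pi(\hat\beta + h/\sqrt{n}) - \psi(h)\pi(\beta_0)|$$

$$\le \psi(h)\pi(\hat\beta + h/\sqrt{n}) \Bigl|\exp\Bigl(\frac{h^T h_n^{(1)}(\hat\beta)}{\sqrt{n}} - \frac{1}{2} h^T \Bigl(-\frac{h_n^{(2)}(\hat\beta)}{n} - I(\beta_0)\Bigr) h + R_n(h)\Bigr) - 1\Bigr| + \psi(h)\pi(\beta_0) \Bigl|\frac{\pi(\hat\beta + h/\sqrt{n})}{\pi(\beta_0)} - 1\Bigr|.$$

By (17)-(20), we get, for any $K > 0$,

$$\sup_{|h| \le K} |g_n(h) - \psi(h)\pi(\beta_0)| \to 0$$

with probability 1, and thus

$$\int_{|h| \le K} |g_n(h) - \psi(h)\pi(\beta_0)|\,dh \to 0$$

with probability 1.

We can make (13) as small as possible by choosing $K$ sufficiently large.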
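As for (14), note that by Lemma 1 with the property of $l_n(\beta)$, there exists
a constant $M$ such that $\sup_{\beta \in B} \|h_n^{(3)}(\beta)\| / n \le M$ on a compact neighborhood $B$
of $\beta_0$ for sufficiently large $n$. Hence, we can write

$$R_n(h) \le \frac{1}{6} \sum_{i,j,k=1}^{p} \frac{|h_i h_j h_k|}{n^{3/2}} \Bigl|\frac{\partial^3 h_n(\beta^*)}{\partial \beta_i\, \partial \beta_j\, \partial \beta_k}\Bigr| \le \frac{p^3 M |h|^3}{6 \sqrt{n}} \tag{21}$$

for some $\beta^*$ in between $\hat\beta + h/\sqrt{n}$ and $\hat\beta$. Let $\eta > 0$ be the smallest eigenvalue of
$I(\beta_0)$. Since $-h_n^{(2)}(\hat\beta)/n \to I(\beta_0)$ with probability 1, we have

$$h^T (-h_n^{(2)}(\hat\beta)/n) h \ge (\eta - o(1))\, h^T h. \tag{22}$$

Also, when $|h| \ge 1$,

$$\Bigl|\frac{h^T h_n^{(1)}(\hat\beta)}{\sqrt{n}}\Bigr| \le |h|^2\, \frac{\|h_n^{(1)}(\hat\beta)\|}{\sqrt{n}} = o(1)\, h^T h \tag{23}$$

with probability 1. Now, combining (21), (22) and (23), we have

$$h_n(\hat\beta + h/\sqrt{n}) - h_n(\hat\beta) \le -h^T h\,(\eta/2 - p^3 \delta M/6 + o(1))$$

when $1 \le |h| < \sqrt{n}\delta$ for all sufficiently large $n$ with probability 1. Set $\delta$ sufficiently
small that $\eta/2 - p^3 \delta M/6\,(= \kappa) > 0$ and $\sup_{|\beta - \beta_0| \le 2\delta} \pi(\beta)\,(= \varrho) < \infty$. Then

$$\int_{K < |h| < \sqrt{n}\delta} g_n(h)\,dh \le \int_{K < |h| < \sqrt{n}\delta} \exp(-h^T h\, \kappa/2)\, \pi(\hat\beta + h/\sqrt{n})\,dh \le \varrho \int_{K < |h| < \sqrt{n}\delta} \exp(-|h|^2 \kappa/2)\,dh$$

for all sufficiently large $n$ with probability 1. Hence, we can make (14) as
small as possible by choosing a sufficiently large $K$.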

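For (15), let $\psi(x) = \int_0^1 (1 - (1 - y)^{x - 1})/y\,dy$. Then it can be shown that
$\sup_{\varsigma \le x < \infty} x \psi'(x) = \psi^* < \infty$ where $\psi'(x) = d\psi(x)/dx$. Note that

$$h_n(\beta) \le \sum_{i=1}^{q_n} \log \int_0^1 \frac{1 - (1 - x)^{\exp(\beta^T Z_{d(i)})}}{x} (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)} g_{t_i}(x)\,dx$$

$$\le \sum_{i=1}^{q_n} \log \Bigl(g^* \int_0^1 \frac{1 - (1 - x)^{\exp(\beta^T Z_{d(i)})}}{x} (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) + \varsigma - 1}\,dx\Bigr)$$

$$= \sum_{i=1}^{q_n} \log \Bigl(g^* \Bigl[\psi\Bigl(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + \varsigma\Bigr) - \psi\Bigl(\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) + \varsigma\Bigr)\Bigr]\Bigr)$$

$$\le \sum_{i=1}^{q_n} \log \Bigl(g^* \psi^* \frac{\exp(\beta^T Z_{d(i)})}{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) + \varsigma}\Bigr) \tag{24}$$

$$\le C q_n + \sum_{i=1}^{q_n} \log \frac{\exp(\beta^T Z_{d(i)})}{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)}, \tag{25}$$

where $C = \log(g^* \psi^*)$. Here the inequality in (24) follows from

$$\psi\Bigl(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + \varsigma\Bigr) - \psi\Bigl(\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) + \varsigma\Bigr) = \exp(\beta^T Z_{d(i)})\,\psi'(a) = \frac{\exp(\beta^T Z_{d(i)})}{a}\, a \psi'(a) \le \psi^* \frac{\exp(\beta^T Z_{d(i)})}{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) + \varsigma},$$

where $a$ is a positive number between $\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) + \varsigma$ and
$\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + \varsigma$.

Let

$$l_n^+(\beta) = \sum_{i=1}^{q_n} \log \frac{\exp(\beta^T Z_{d(i)})}{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)}.$$

Note that $R_n^+(t_i)$ are nonempty sets and so $l_n^+(\beta)$ is well defined for all
sufficiently large $n$ with probability 1. Also, by direct calculation we can see
that $l_n^+(\beta)$ is a strictly concave function. Since $\sup_{\beta \in B} |l_n^+(\beta) - l_n(\beta)|/n = o(1)$
for any compact subset $B$ of $\mathbb{R}^p$, we have $\sup_{\beta \in B} |l_n^+(\beta)/n - l(\beta)| \to 0$. Now,
choose $m$ such that

$$\sup_{\beta : |\beta - \beta_0| = m} (l(\beta) - l(\beta_0)) < -qC - \eta$$

for some $\eta > 0$ where $q = \Pr\{T_1 \le \tau, \delta_1 = 1\}$. Then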

sup ^ilM< sup g^jp [(/?) + o(l). 

0:\0-/3o\>m l3:\l3-l3o\=m ^ /3;|/3-A|=m 

Since $q_n/n \to q$ and $h_n(\hat\beta)/n \to l(\beta_0)$ with probability 1 by Lemma 1, we
have

sup l:^{P)/n- hniP)/n 

/3:|/3-/3o|><5/2 

< sup (/+(/3)/n-/i„(/3)/n) 

f3:5/2<\/3-l3o\<m 

+ sup (/+(/?)/n-/i„(/3)/n) 

/3:|/3-/3o|>m 



-qC-ri + o{l). 



Finally, 



\h\>y/nS "'|/3-/3|>(5 
/3:|/3-/3|>5 



vp/'^ exp 



n[qnC/n+ sup /+(/3)/n - ) 

. V /3:l/3-0nl>5/2 / 



/3:|/3-/3o|>5/2 

<nP/^ exp[n(-r/ + o(l))] 
< nP/2e-"'?/2 ^ 
for all sufficiently large n with probability 1 and the proof is done. 

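4.2. Proof of Theorem 3.2. Let $\theta_n = \{\sqrt{n}(\beta - \hat\beta) = h, D_n\}$ be given. We
decompose $\sqrt{n}(A(\cdot) - \hat A(\cdot))$ by

$$\sqrt{n}(A(\cdot) - \hat A(\cdot)) = \sqrt{n}(A(\cdot) - A^d(\cdot)) \tag{26}$$

$$+ \sqrt{n}(A^d(\cdot) - \bar A(\cdot)) \tag{27}$$

$$+ \sqrt{n}(\bar A(\cdot) - \hat A_h(\cdot)) \tag{28}$$

$$+ \sqrt{n}(\hat A_h(\cdot) - \hat A(\cdot)), \tag{29}$$

where

$$A^d(t) = \sum_{i=1}^{q_n} \Delta A(t_i) I(t_i \le t),$$

$$\bar A(t) = E(A^d(t) \mid \theta_n)$$

and

$$\hat A_h(t) = \int_0^t \frac{dN_\cdot(u)}{\sum_{i \in R_n(u)} \exp(\beta_h^T Z_i)}$$

with $\beta_h = \hat\beta + h/\sqrt{n}$ and $N_\cdot(t) = \sum_{i=1}^{n} I(T_i \le t, \delta_i = 1)$. Then we will prove
that when $\theta_n$ is given, with probability 1, (26) and (28) converge to 0 weakly,
(27) converges to $W(U_0(\cdot))$ weakly, and (29) converges to $-h^T e_0(\cdot)$ on $D[0, \tau]$
with probability 1. Then Slutsky's theorem completes the proof.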

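For (26), let $\beta_h = \hat\beta + h/\sqrt{n}$ and $Y_n(t) = A(t) - A^d(t)$. Then Theorem 2.1
yields that conditional on $\theta_n$, $Y_n(t)$ is an NII process with Lévy measure $\nu_{Y_n}$
given by

$$\nu_{Y_n}(dt, dx) = (1 - x)^{\sum_{j \in R_n(t)} \exp(\beta_h^T Z_j)} \frac{g_t(x)}{x}\,dx\,\lambda(t)\,dt.$$

Since $Y_n$ is nondecreasing, $\sup_{t \in [0, \tau]} |\sqrt{n} Y_n(t)| = \sqrt{n} Y_n(\tau)$ and so it suffices
to show that $\mathcal{L}(\sqrt{n} Y_n(\tau) \mid \theta_n) \to 0$ with probability 1, which is equivalent to
$\Pr\{|\sqrt{n} Y_n(\tau)| > \varepsilon \mid \theta_n\} \to 0$ with probability 1 for any $\varepsilon > 0$. By the Chebyshev
inequality, we have

$$\Pr\{|\sqrt{n} Y_n(\tau)| > \varepsilon \mid \theta_n\} \le \frac{1}{\varepsilon^2} \bigl((\sqrt{n} E(Y_n(\tau) \mid \theta_n))^2 + n \operatorname{Var}(Y_n(\tau) \mid \theta_n)\bigr).$$

Let

$$\phi_k = n^{k/2} \int_0^\tau \int_0^1 x^{k - 1} (1 - x)^{\sum_{j \in R_n(s)} \exp(\beta_h^T Z_j)} g_s(x)\,dx\,\lambda(s)\,ds.$$

Then

$$\phi_k \le n^{k/2} g^* \int_0^\tau \int_0^1 x^{k - 1} (1 - x)^{\sum_{j \in R_n(s)} \exp(\beta_h^T Z_j) + \varsigma - 1}\,dx\,\lambda(s)\,ds \to 0$$

with probability 1 for $k = 1, 2$. Since $\sqrt{n} E(Y_n(\tau) \mid \theta_n) = \phi_1$ and
$n \operatorname{Var}(Y_n(\tau) \mid \theta_n) = \phi_2$, (26) converges to 0 on
$D[0, \tau]$ with probability 1.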

For (27)-(29), we need the following lemma, whose proof is in the Appendix. 

Lemma 3. For any compact subset $B$ of $\mathbb{R}^p$ and any positive integer $k$,

$$\sup_{\beta \in B,\, 1 \le i \le q_n} \Bigl|E(\Delta A(t_i)^k \mid \beta, D_n) - \frac{k!\,\Gamma(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + 1)}{\Gamma(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + k + 1)}\Bigr| = o(n^{-(k + 1/2)})$$

with probability 1.

For (27), since conditional on $\theta_n$ the process $\sqrt{n}(A^d - \bar A)$ is an independent
increment process, we utilize Theorem 19 of Section V.4 in [20]. Let
$Y_n = \sqrt{n}(A^d - \bar A)$. We first prove the convergence of the finite-dimensional
distribution by showing Lyapounov's condition. Suppose $0 \le s < t \le \tau$ are
given. Note that

$$Y_n(t) - Y_n(s) = \sum_{s < t_i \le t} \sqrt{n}(\Delta A(t_i) - \Delta \bar A(t_i)).$$

Lemma 3 implies

$$\sup_{i = 1, \ldots, q_n} E[(\sqrt{n}(\Delta A(t_i) - \Delta \bar A(t_i)))^4 \mid \theta_n] = O(n^{-2})$$

with probability 1. Hence

$$\sum_{s < t_i \le t} E[(\sqrt{n}(\Delta A(t_i) - \Delta \bar A(t_i)))^4 \mid \theta_n] = \int_s^t O(n^{-2})\,dN_\cdot(u) \to 0 \tag{30}$$

with probability 1. Similarly, we have that

$$\operatorname{Var}(Y_n(t) - Y_n(s) \mid \theta_n) = \sum_{s < t_i \le t} E[(\sqrt{n}(\Delta A(t_i) - \Delta \bar A(t_i)))^2 \mid \theta_n] = n \int_s^t \frac{(1 + o(n^{-1/2}))\,dN_\cdot(u)}{(\sum_{j \in R_n(u)} \exp(\beta_h^T Z_j))^2} \tag{31}$$

with probability 1. Hence, Lemma A.2 in [24] yields

$$\operatorname{Var}(Y_n(t) - Y_n(s) \mid \theta_n) \to U_0(t) - U_0(s) \tag{32}$$

with probability 1. Now (30) and (32) imply the finite-dimensional
distributions of $Y_n$ converge to those of $W(U_0(\cdot))$ weakly. Finally, note that

$$\Pr\{|Y_n(t) - Y_n(s)| > \varepsilon \mid \theta_n\} \le \frac{1}{\varepsilon^2} \operatorname{Var}(Y_n(t) - Y_n(s) \mid \theta_n).$$

By (32), we have

$$\operatorname{Var}(Y_n(t) - Y_n(s) \mid \theta_n) = U_0(t) - U_0(s) + o(1)$$

with probability 1. Since $U_0(t)$ is continuous, with probability 1 we can make
$\Pr\{|Y_n(t) - Y_n(s)| > \varepsilon \mid \theta_n\}$ as small as possible for all sufficiently large $n$ by
choosing $t$ and $s$ sufficiently close. Hence Theorem 19 of Section V.4 in [20]
allows us to conclude that $\mathcal{L}(Y_n \mid \theta_n)$ converges weakly to $W(U_0)$ on $D[0, \tau]$
with probability 1.

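For (28), Lemma 3 yields that

$$\Delta \bar A(t_i) = \frac{1}{\sum_{j \in R_n(t_i)} \exp(\beta_h^T Z_j) + 1} + o(n^{-3/2}).$$

Therefore

$$\sup_{t \in [0, \tau]} |\sqrt{n}(\bar A(t) - \hat A_h(t))| \le \sqrt{n} \int_0^\tau \Bigl|\frac{1}{\sum_{j \in R_n(u)} \exp(\beta_h^T Z_j) + 1} - \frac{1}{\sum_{j \in R_n(u)} \exp(\beta_h^T Z_j)}\Bigr|\,dN_\cdot(u) + o(1)$$

$$= \sqrt{n} \int_0^\tau \frac{O(1)\,dN_\cdot(u)}{(\sum_{j \in R_n(u)} \exp(\beta_h^T Z_j))^2} + o(1) \to 0$$

with probability 1 by Lemma A.2 of [24].

Finally, the proof of (29) converging to $-h^T e_0(\cdot)$ can be found in the proof of
Theorem 3 in [15].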

APPENDIX: PROOF OF LEMMAS IN SECTION 4 
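Lemma A.1. Let

$$\eta_i(x, \beta) = \frac{(1 - (1 - x)^{\exp(\beta^T Z_{d(i)})})\,g_{t_i}(x)(1 - x)}{k(t_i)\,x} - \exp(\beta^T Z_{d(i)})(1 - x)^{\exp(\beta^T Z_{d(i)})} \tag{33}$$

and let $\eta_i^{(k)}(x, \beta)$ be the $k$th derivative of $\eta_i(x, \beta)$ in $\beta$. Let $\alpha' = \min\{1, \alpha\}$
where $\alpha$ is in condition (C2). Then for any compact subset $B$ of $\mathbb{R}^p$, there
exist constants $M_k$, $k = 0, 1, 2, 3$, such that

$$\sup_{\beta \in B,\, x \in (0, 1),\, 1 \le i \le q_n} \frac{\|\eta_i^{(k)}(x, \beta)\|}{x^{\alpha'}} \le M_k \tag{34}$$

with probability 1.

Proof. Write

$$\eta_i(x, \beta) = [\phi(x, \beta, Z_{d(i)}) - \exp(\beta^T Z_{d(i)})]\,\frac{g_{t_i}(x)(1 - x)}{k(t_i)} \tag{35}$$

$$+ \exp(\beta^T Z_{d(i)})\,\frac{(1 - x)(g_{t_i}(x) - k(t_i))}{k(t_i)} \tag{36}$$

$$+ \exp(\beta^T Z_{d(i)})\,[(1 - x) - (1 - x)^{\exp(\beta^T Z_{d(i)})}], \tag{37}$$

where

$$\phi(x, \beta, Z) = \frac{1 - (1 - x)^{\exp(\beta^T Z)}}{x}.$$

For (35), let $h(x : \beta, Z) = \phi(x, \beta, Z) - \exp(\beta^T Z)$. Then direct calculation
yields that

$$m_1 = \sup_{\beta \in B,\, x \in (0, 1/2),\, \|Z\| \le M_Z} |h'(x : \beta, Z)| < \infty,$$

where $h'(x : \beta, Z) = dh(x : \beta, Z)/dx$. Now, since $h(0 : \beta, Z) = 0$, the mean
value theorem implies that

$$\sup_{\beta \in B,\, x \in (0, 1/2),\, \|Z\| \le M_Z} \Bigl|\frac{h(x : \beta, Z)}{x^{\alpha'}}\Bigr| = \sup_{\beta \in B,\, x \in (0, 1/2),\, \|Z\| \le M_Z} \Bigl|\frac{h(x : \beta, Z) - h(0 : \beta, Z)}{x}\Bigr|\,x^{1 - \alpha'} \le \sup_{\beta \in B,\, x \in (0, 1/2),\, \|Z\| \le M_Z} |h'(x : \beta, Z)|. \tag{38}$$

Hence

$$\sup_{\beta \in B,\, x \in (0, 1/2)} \Bigl|\frac{(35)}{x^{\alpha'}}\Bigr| \le \frac{m_1 g^*}{k_*}.$$

Also, it is easy to see that

$$\sup_{\beta \in B,\, x \in [1/2, 1)} \Bigl|\frac{(35)}{x^{\alpha'}}\Bigr| \le D$$

for some constant $D$, since the numerator as well as the denominator is
finite.

For (36), conditions (C1) and (C2) imply that

$$m_2 = \sup_{x \in (0, 1),\, t \in [0, \tau]} \Bigl|\frac{(1 - x)(g_t(x) - k(t))}{x^{\alpha'}}\Bigr| < \infty.$$

So

$$\sup_{\beta \in B,\, x \in (0, 1)} \Bigl|\frac{(36)}{x^{\alpha'}}\Bigr| \le \frac{C_B}{k_*}\,m_2,$$

where $C_B = \sup_{\beta \in B,\, \|Z\| \le M_Z} \exp(\beta^T Z)$.

For (37), let $k(x : \beta, Z) = (1 - x) - (1 - x)^{\exp(\beta^T Z)}$ and let $k'(x : \beta, Z)$ be
the first derivative of $k(x : \beta, Z)$ in $x$. Direct calculation yields

$$\sup_{\beta \in B,\, x \in (0, 1/2],\, \|Z\| \le M_Z} |k'(x : \beta, Z)| < \infty.$$

So we can use a method similar to (38) to show

$$m_3 = \sup_{\beta \in B,\, x \in (0, 1/2],\, \|Z\| \le M_Z} \Bigl|\frac{k(x : \beta, Z)}{x^{\alpha'}}\Bigr| < \infty.$$

Also, it is true that

$$m_4 = \sup_{\beta \in B,\, x \in (1/2, 1],\, \|Z\| \le M_Z} \Bigl|\frac{k(x : \beta, Z)}{x^{\alpha'}}\Bigr| < \infty.$$

Hence

$$\sup_{\beta \in B,\, x \in (0, 1)} \Bigl|\frac{(37)}{x^{\alpha'}}\Bigr| \le C_B (m_3 + m_4).$$

Now the proof of (34) for $k = 0$ is done by letting

$$M_0 = m_1 g^*/k_* + D + m_2 C_B/k_* + C_B (m_3 + m_4).$$

The results for $k = 1, 2, 3$ follow from similar arguments. □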

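Proof of Lemma 1. We can write (up to an additive term $\sum_i \log k(t_i)$ that does
not depend on $\beta$)

$$h_n(\beta) = -\rho_n(\beta) + l_n(\beta) + \sum_{i=1}^{q_n} \log(1 + \zeta_i(\beta)/\chi_i(\beta)),$$

where

$$\chi_i(\beta) = \frac{\exp(\beta^T Z_{d(i)})}{\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j)}$$

and

$$\zeta_i(\beta) = \int_0^1 \frac{1 - (1 - x)^{\exp(\beta^T Z_{d(i)})}}{x} (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)} \frac{g_{t_i}(x)}{k(t_i)}\,dx - \chi_i(\beta).$$

For $\rho_n(\beta)$, using conditions (A2) and (C1), we have

$$\sup_{\beta \in B} \rho_n(\beta) \le M \sum_{i=1}^{n} \int_0^\tau \int_0^1 (1 - x)^{c(n - i) + \varsigma - 1}\,dx\,\lambda(t)\,dt = O(\log n) \tag{39}$$

with probability 1 for some positive constants $M$ and $c$. Similarly, we can
show that $\sup_{\beta \in B} \|\rho_n^{(k)}(\beta)\| = O(\log n)$ with probability 1 for $k = 1, 2, 3$.
Let

$$\xi_i(\beta) = \zeta_i(\beta)/\chi_i(\beta). \tag{40}$$

The proof will be complete if we show that $\sup_{\beta \in B,\, i = 1, \ldots, q_n} \|\xi_i^{(k)}(\beta)\| = o(1)$
for $k = 0, 1, 2, 3$. However, we will show $\sup_{\beta \in B,\, i = 1, \ldots, q_n} \|\xi_i^{(k)}(\beta)\| = o(n^{-1/2})$
to use it in the proof of Lemma 2. Since $\sup_{\beta \in B,\, i = 1, \ldots, q_n} \|\chi_i^{(k)}(\beta)\| = O(n^{-1})$
for $k = 0, 1, 2, 3$, it suffices to show $\sup_{\beta \in B,\, i = 1, \ldots, q_n} \|\zeta_i^{(k)}(\beta)\| = o(n^{-3/2})$ for
$k = 0, 1, 2, 3$.

For $k = 0$, let $\alpha' = \min\{\alpha, 1\}$ where $\alpha$ is in condition (C2). Since

$$\frac{\exp(\beta^T Z_{d(i)})}{\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j)} = \int_0^1 (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) - 1} \exp(\beta^T Z_{d(i)})(1 - x)^{\exp(\beta^T Z_{d(i)})}\,dx,$$

we can write

$$\zeta_i(\beta) = \int_0^1 (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) - 1}\,\eta_i(x, \beta)\,dx,$$

where $\eta_i$ is defined in (33). Then Lemma A.1 yields

$$\sup_{\beta \in B,\, 1 \le i \le q_n} \|\zeta_i(\beta)\| \le \int_0^1 (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) - 1} x^{\alpha'} M_0\,dx = O(n^{-(\alpha' + 1)}) = o(n^{-3/2}), \tag{41}$$

where the last equality is due to the fact that $\alpha' > 1/2$ by condition (C2).
Similarly, we can get $\sup_{\beta \in B,\, 1 \le i \le q_n} \|\zeta_i^{(k)}(\beta)\| = o(n^{-3/2})$ for $k = 1, 2, 3$. □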



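Proof of Lemma 2. We have

$$\|h_n^{(1)}(\hat\beta)\| \le \|\rho_n^{(1)}(\hat\beta)\| + \|l_n^{(1)}(\hat\beta)\| + \sum_{i=1}^{q_n} \Bigl\|\frac{\xi_i^{(1)}(\hat\beta)}{1 + \xi_i(\hat\beta)}\Bigr\|, \tag{42}$$

where $\xi_i(\beta)$ is defined in (40). We have shown in the proof of Lemma 1 that
$\|\rho_n^{(1)}(\hat\beta)\| = O(\log n)$ and

$$\sum_{i=1}^{q_n} \Bigl\|\frac{\xi_i^{(1)}(\hat\beta)}{1 + \xi_i(\hat\beta)}\Bigr\| = o(\sqrt{n})$$

with probability 1. Since $l_n^{(1)}(\hat\beta) = 0$, the proof is done. □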
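Proof of Lemma 3. Let $\theta = (\beta, D_n)$. Let

$$e_i^k(\beta) = \int_0^1 x^k\, \frac{1 - (1 - x)^{\exp(\beta^T Z_{d(i)})}}{x} (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j)} g_{t_i}(x)\,dx$$

and

$$\bar e_i^k(\beta) = k(t_i) \exp(\beta^T Z_{d(i)})\, \frac{k!\,\Gamma(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j))}{\Gamma(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + k + 1)}.$$

Since

$$e_i^k(\beta) - \bar e_i^k(\beta) = k(t_i) \int_0^1 x^k (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) - 1}\,\eta_i(x, \beta)\,dx,$$

using Lemma A.1, we have

$$\sup_{\beta \in B} |e_i^k(\beta) - \bar e_i^k(\beta)| \le k^* \sup_{\beta \in B} \int_0^1 x^k (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) - 1} |\eta_i(x, \beta)|\,dx$$

$$\le k^* M_0 \int_0^1 x^{k + \alpha'} (1 - x)^{\sum_{j \in R_n^+(t_i)} \exp(\beta^T Z_j) - 1}\,dx = O(n^{-(k + \alpha' + 1)}) = o(n^{-(k + 3/2)})$$

with probability 1.
Now, we can write

$$\sup_{\beta \in B} \Bigl|E(\Delta A(t_i)^k \mid \theta) - \frac{k!\,\Gamma(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + 1)}{\Gamma(\sum_{j \in R_n(t_i)} \exp(\beta^T Z_j) + k + 1)}\Bigr| = \sup_{\beta \in B} \Bigl|\frac{e_i^k(\beta)}{e_i^0(\beta)} - \frac{\bar e_i^k(\beta)}{\bar e_i^0(\beta)}\Bigr|$$

$$\le \sup_{\beta \in B} \Bigl|\frac{e_i^k(\beta) - \bar e_i^k(\beta)}{e_i^0(\beta)}\Bigr| + \sup_{\beta \in B} \Bigl|\frac{\bar e_i^k(\beta)\,(e_i^0(\beta) - \bar e_i^0(\beta))}{e_i^0(\beta)\,\bar e_i^0(\beta)}\Bigr|. \tag{43}$$

Note that $\bar e_i^k(\beta) = O(n^{-(k + 1)})$ and hence $e_i^k(\beta) = O(n^{-(k + 1)})$. Therefore,

$$(43) = \frac{o(n^{-(k + 3/2)})}{O(n^{-1})} + \frac{O(n^{-(k + 1)})\,o(n^{-3/2})}{O(n^{-2})} = o(n^{-(k + 1/2)})$$

with probability 1, and the proof is done. □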